Pulse oximetry values from 33,080 participants in the Apple Heart & Movement Study

Wearable devices that include pulse oximetry (SpO2) sensing afford the opportunity to capture oxygen saturation measurements from large cohorts under naturalistic conditions. We report here a cross-sectional analysis of 72 million SpO2 values collected from 33,080 individual participants in the Apple Heart and Movement Study, stratified by age, sex, body mass index (BMI), home altitude, and other demographic variables. Measurements aggregated by hour of day into 24-h SpO2 profiles exhibit similar circadian patterns for all demographic groups, being approximately sinusoidal with nadir near midnight local time, zenith near noon local time, and mean 0.8% lower saturation during overnight hours. Using SpO2 measurements averaged for each subject into mean nocturnal and daytime SpO2 values, we employ multivariate ordinary least squares regression to quantify population-level trends according to demographic factors. For the full cohort, regression coefficients obtained from models fit to daytime SpO2 are in close quantitative agreement with the corresponding values from published reference models for awake arterial oxygen saturation measured under controlled laboratory conditions. Regression models stratified by sex reveal significantly different age- and BMI-dependent SpO2 trends for females compared with males, although constant terms and regression coefficients for altitude do not differ between sexes. Incorporating categorical variables encoding self-reported race/ethnicity into the full-cohort regression models identifies small but statistically significant differences in daytime SpO2 (largest coefficient corresponding to 0.13% lower SpO2, for Hispanic study participants compared to White participants), but no significant differences between groups for nocturnal SpO2. 
Additional stratified analysis comparing regression models fit independently to subjects in each race/ethnicity group suggests small differences in age- and sex-dependent trends, but indicates no significant difference in constant terms between any race/ethnicity groups for either daytime or nocturnal SpO2. The large, diverse study population and a study design employing automated background SpO2 measurements spanning the full 24-h circadian cycle enable the establishment of healthy population reference trends outside of clinical settings.


Figure Legends
Supplementary Figure 3: Linear regression R² and model coefficients produced by fitting M1 using subject-mean SpO2 aggregated for each individual hour of the day. (a) Fitted R² is highest during typical sleep hours (approx. 22:00-06:00) compared with daytime hours. Age (c), BMI (d) and altitude (f) coefficients exhibit clear circadian variation and have the greatest absolute magnitude during typical sleep hours. Coefficients for (e) biological sex and (g-j) race/ethnicity group also exhibit a small degree of diurnal variation. In all plots, error whiskers correspond to 99.5% confidence intervals. R² = coefficient of determination.
Supplementary Figure 4: Comparison of regression coefficients for M1 models fit to the full cohort using SpO2 measurements from all available dates for each subject ('Original Cohort, Full Study Window'), for a subset of subjects with SpO2 measurements limited to a maximum timespan of 30 calendar days ('Rolling 30d Cohort, 30d Data Window'), and for the same subset of subjects using SpO2 measurements from all available dates ('Rolling 30d Cohort, Full Study Window'), for daytime SpO2 (a-d, top row) and nocturnal SpO2 (e-h, bottom row). Error bars represent 99.5% confidence intervals for the fitted coefficients. Race/ethnicity variables are omitted for clarity. Plotted coefficients and confidence intervals are identical to the values listed in Supplementary Table 10.
Supplementary Figure 5: Comparison of regression coefficients for M1 models fit for subjects stratified by self-reported health conditions and smoking habits, for daytime SpO2 (a-d, top row) and nocturnal SpO2 (e-h, bottom row). Error bars represent 99.5% confidence intervals for the fitted coefficients. Race/ethnicity variables are omitted for clarity. Plotted coefficients and confidence intervals are identical to the values listed in Supplementary Table 11.
Supplementary Figure 6: Linear relationships between mean day-night SpO2 difference (dn∆SpO2) and the three independent variables exhibiting the strongest correlation with this metric: age (a), BMI (b) and estimated home altitude (c). Each plot presents a 2-dimensional histogram of values from all 33,080 subjects in evenly spaced hexagonal bins, with color density corresponding to log-scaled bin counts for clarity. Positive values of dn∆SpO2 correspond to an overnight drop in measured SpO2. In each plot, the overlaid red line represents the simple univariate linear regression fit using the independent variable shown on the x-axis; the listed slope and Pearson correlation coefficient correspond to the same univariate fit.
Supplementary Figure 7: Histograms of nocturnal SpO2 for Black and White subjects, after linear adjustment of age to a target of 40 years, BMI to a target of 25.0 kg/m², and home altitude to zero elevation. Distributions are shown for the full range of nocturnal SpO2 (a) and for the low-saturation range (b). Data for both plots are identical; only the axis limits differ. The distributions do not differ significantly (p > .05) by a two-sample Kolmogorov-Smirnov test with a two-sided alternative hypothesis, either over the full range of SpO2 or when the distributions are clipped at 94% saturation to emphasize the hypoxic tail.
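The covariate adjustment and distribution test described in the Supplementary Figure 7 legend can be sketched as follows. This is a minimal standard-library illustration, not the study's code: the slope constants `BETA_AGE`, `BETA_BMI` and `BETA_ALT` are hypothetical placeholders (in practice they would be fitted nocturnal M1 coefficients), and `adjust_spo2` and `ks_statistic` are illustrative names. Each subject's value is shifted along the linear trend to the target covariates (age 40 years, BMI 25.0 kg/m², altitude 0), and the two-sample Kolmogorov-Smirnov D statistic is the largest gap between the two empirical CDFs.

```python
from bisect import bisect_right

# Hypothetical slopes (% SpO2 per unit); placeholders, NOT this study's estimates.
BETA_AGE = -0.01    # per year of age
BETA_BMI = -0.05    # per kg/m^2 of BMI
BETA_ALT = -0.001   # per metre of home altitude


def adjust_spo2(spo2, age, bmi, altitude_m,
                target_age=40.0, target_bmi=25.0, target_alt=0.0):
    """Shift one subject's SpO2 along the linear trend to the target covariates."""
    return (spo2
            - BETA_AGE * (age - target_age)
            - BETA_BMI * (bmi - target_bmi)
            - BETA_ALT * (altitude_m - target_alt))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D = max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both ECDFs are step functions, so the maximum gap occurs at an observed value.
    for x in a + b:
        gap = abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        d = max(d, gap)
    return d


# Example: a 50-year-old with BMI 30.0 living at 500 m, nocturnal SpO2 of 95.0%.
adjusted = adjust_spo2(95.0, age=50, bmi=30.0, altitude_m=500)
```

For a p-value, `scipy.stats.ks_2samp(a, b, alternative='two-sided')` returns the same D statistic together with a p-value; restricting both adjusted samples to values at or below 94% before testing is one way to mirror the hypoxic-tail comparison.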

Supplementary Note 1 (Variance Analysis)
To investigate the magnitude of subject-to-subject variability in dSpO2 and nSpO2 compared with day-to-day variability within subjects, we prepared a dataset of per-date dSpO2 and nSpO2 by averaging SpO2 values in 24-hour time windows for each subject, using the same clock hours as described in the Methods to delineate daytime and nocturnal measurements. We excluded individual subject-dates containing only a single daytime or nocturnal SpO2 measurement, representing 1.3% of total subject-dates. This yielded 3.41M per-date dSpO2 values and 3.71M per-date nSpO2 values for downstream variance analysis via nested one-way ANOVA and variance components analysis (VCA).
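The per-date averaging and the subject-level variance decomposition can be illustrated with a short standard-library Python sketch. The clock-hour windows `DAY_HOURS` and `NIGHT_HOURS` are hypothetical stand-ins for the hours defined in the Methods, records are assumed to be `(subject_id, date, hour, spo2)` tuples grouped by calendar date, and the ANOVA shown is a plain one-way ANOVA with subject as the grouping factor, a simplification of the nested design used here. η² is the between-subject sum of squares divided by the total sum of squares.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical clock-hour windows; the study's actual hours are given in the Methods.
DAY_HOURS = set(range(9, 21))    # 09:00-20:59 (assumption)
NIGHT_HOURS = set(range(0, 6))   # 00:00-05:59 (assumption)


def per_date_means(records, hours):
    """Mean SpO2 per (subject, date) within the given clock hours.

    records: iterable of (subject_id, date, hour, spo2) tuples.
    Subject-dates with only a single measurement in the window are excluded.
    """
    groups = defaultdict(list)
    for subject, day, hour, spo2 in records:
        if hour in hours:
            groups[(subject, day)].append(spo2)
    return {key: fmean(vals) for key, vals in groups.items() if len(vals) > 1}


def group_by_subject(per_date):
    """Collect per-date means into {subject_id: [mean, mean, ...]}."""
    by_subject = defaultdict(list)
    for (subject, _day), value in per_date.items():
        by_subject[subject].append(value)
    return dict(by_subject)


def one_way_anova(values_by_subject):
    """Return (eta_squared, F) for a one-way ANOVA with subject as the factor."""
    groups = list(values_by_subject.values())
    means = [fmean(g) for g in groups]
    all_vals = [v for g in groups for v in g]
    n, k = len(all_vals), len(groups)
    grand = fmean(all_vals)
    ss_total = sum((v - grand) ** 2 for v in all_vals)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_between / ss_total, f_stat
```

Chaining `one_way_anova(group_by_subject(per_date_means(records, DAY_HOURS)))` yields η² and F for daytime values; the REML variance-components model would be fit separately, for example with a mixed-effects library.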
Nested one-way ANOVA results for dSpO2 and nSpO2 are summarized in Supplementary Tables 1 and 2. For both daytime and nocturnal measurements, the between-subjects variance is highly significant (F-test p < 1.0e-10), with η² values of 0.365 and 0.539 for dSpO2 and nSpO2, respectively. Using the same set of per-date dSpO2 and nSpO2 values for each subject, we also performed nested VCA using a mixed-effects linear model with random subject intercepts, fitted via restricted maximum likelihood (REML). Models were fit separately for dSpO2 and nSpO2 to yield estimates of between-subject variance, date-to-date variance within subjects, and residual variance. VCA results are summarized in Supplementary Tables 3 and 4. For both daytime and nocturnal SpO2, VCA results support the conclusion that subject-to-subject differences are the predominant contributor to daily measurement variance. VCA results are in close quantitative agreement with the nested ANOVA results regarding the fraction of variance attributed to subject effects. Additionally, the relative variance contribution of the subject components is greater for nocturnal than for daytime SpO2, a finding that is in agreement with our observation of consistently higher linear model fit
Table 5) shows quantitative agreement between the difference in male-only and female-only M1,sex Age, BMI and altitude regression coefficients and the corresponding interaction coefficients of M1,sex−interaction to within 10⁻⁶. Additionally, all statistically significant coefficients for interaction terms in the fitted model match the covariates that were identified via stratified analysis as having a significant sex-dependent difference.
Additional linear regression models incorporating interaction terms for race/ethnicity groups and for combined sex and race/ethnicity groups were investigated, but produced inferior goodness of fit (as determined by the Bayesian information criterion, BIC) compared with model M1. Analysis of group differences based on race/ethnicity was accomplished via stratified analysis using M1 as described in the 'Statistical Analysis' section of the main document.
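For reference, the model comparison above rests on the standard BIC definition. For an OLS model with Gaussian errors, n subjects, k estimated parameters, and residual sum of squares RSS, substituting the maximum-likelihood variance estimate RSS/n gives the form below. This is a textbook relation rather than a formula quoted from the study's Methods; the last term is identical for models fit to the same data, and the model with the lower BIC is preferred.

```latex
\mathrm{BIC} = k \ln n - 2 \ln \hat{L},
\qquad
\ln \hat{L} = -\frac{n}{2}\left[\ln(2\pi) + \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 1\right]
\;\Longrightarrow\;
\mathrm{BIC} = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + k \ln n + n\left[\ln(2\pi) + 1\right]
```

With n = 33,080, each added interaction term must reduce n ln(RSS/n) by more than ln n ≈ 10.4 to lower the BIC.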

Supplementary Note 3 (SpO2 Circadian Alignment Using Sleep Tracking Data)
A substantial fraction (N = 21,633; 65%) of study subjects provided nightly sleep tracking data sufficiently overlapping their SpO2 measurement dates to enable circadian alignment using measured sleep times. For these subjects ('Sleep Cohort'), the timestamp of each SpO2 measurement was adjusted to represent hours since the most recent start of a reported sleep session, provided sleep data were available within the preceding 24 hours; SpO2 measurements not preceded by a sleep session within the preceding 24 hours were discarded. Using the new measurement timestamps (with 00:00 now corresponding to the start of each night's sleep rather than to midnight local clock time), the 24-h mean circadian profiles, dSpO2, and nSpO2 were recalculated for each subject in the Sleep Cohort.
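The re-timing rule above can be sketched as follows, assuming all timestamps are expressed as hours on one continuous time axis (for example, hours since a fixed reference) and that a measurement is kept only if the most recent sleep start lies strictly less than 24 h earlier; `align_to_sleep` and its arguments are illustrative names, not part of the study's pipeline. A binary search over the sorted sleep-start times finds the most recent start at or before each measurement.

```python
from bisect import bisect_right


def align_to_sleep(measurement_times, sleep_starts, max_gap_h=24.0):
    """Re-express each measurement time as hours since the most recent sleep start.

    Times are floats in hours on a single continuous axis. Measurements with no
    sleep start in the preceding max_gap_h hours are discarded.
    Returns a list of (original_time, hours_since_sleep_start) pairs.
    """
    starts = sorted(sleep_starts)
    aligned = []
    for t in measurement_times:
        i = bisect_right(starts, t)   # number of sleep starts at or before t
        if i == 0:
            continue                  # no earlier sleep session: discard
        elapsed = t - starts[i - 1]
        if elapsed < max_gap_h:
            aligned.append((t, elapsed))
    return aligned


# Sleep sessions starting at 23:30 on day 0 and 23:00 on day 1 (hours since day-0 midnight).
pairs = align_to_sleep([10.0, 24.5, 50.0, 80.0], [23.5, 47.0])
hour_bins = [int(elapsed) for _, elapsed in pairs]   # sleep-aligned hour-of-day bins
```

The resulting hour bins would take the place of clock hours when rebuilding the 24-h profiles and the sleep-aligned dSpO2 and nSpO2.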
The distribution of all nightly sleep start times relative to midnight clock time is shown in Supplementary Figure 1a. Mean nightly sleep start times for the Sleep Cohort were centered shortly before midnight (median sleep start 23:35 local clock time, IQR 107 minutes). The 24-hour circadian profiles for the original cohort and the Sleep Cohort aligned closely (Supplementary Figure 1b), with dSpO2 and nSpO2 differing between sleep-aligned and original values by a mean ± standard deviation of 0.06 ± 0.30% and -0.04 ± 0.27%, respectively. Scatterplots of the original clock-aligned versus sleep-aligned dSpO2 and nSpO2 for all subjects in the Sleep Cohort are shown in Supplementary Figures 1c and 1d.
Sleep-aligned dSpO2 and nSpO2 values from the Sleep Cohort were fit using the proposed regression model M1, and the regression coefficients were compared against models fit using the original clock-aligned dSpO2 and nSpO2 values from the Sleep Cohort, as well as the original clock-aligned values from the full study population. No regression coefficients differed meaningfully between models fit to the full study population and to the Sleep Cohort, or between models fit to the Sleep Cohort using clock-aligned versus sleep-aligned SpO2 measurements. A comparison of fitted regression coefficients is shown in Supplementary Figure 2.

Supplementary Note 4 (Influence of Reduced Data Timespan on Regression Results)
As described in the Methods section of the primary document, the dSpO2 and nSpO2 values for each subject used in the regression modeling analysis were calculated by averaging all daytime and nocturnal measurements captured over the full timespan of the data set (up to 37 weeks per subject). To evaluate the impact of a reduced data timespan, we separately calculated dSpO2 and nSpO2 while limiting the total data timespan to a maximum of 30 calendar days for each subject. Because many subjects did not produce SpO2 measurements on every calendar day, this necessitated a rolling 30-day date window for each subject, with dSpO2 and nSpO2 calculated from the most recent 30-day window in which the subject satisfied the data coverage requirements described in the Methods section. A total of 29,556 subjects ('Rolling 30d Cohort') satisfied this requirement; the remaining 3,524 subjects failed to meet the daytime and nocturnal measurement count requirements within any 30-day window.
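One way to implement the window search is sketched below. The minimum measurement counts `min_day` and `min_night` are hypothetical parameters standing in for the coverage requirements in the Methods, and the sketch assumes that candidate windows end on a calendar day with data, scanned from the most recent such day backwards, with each window spanning 30 calendar days inclusive of both ends.

```python
from datetime import date, timedelta


def most_recent_window(daily_counts, min_day, min_night, span_days=30):
    """Find the most recent qualifying calendar window for one subject.

    daily_counts: {date: (n_daytime, n_nocturnal)} measurement counts per day.
    Returns (start, end) dates of the most recent span_days-long window that
    meets both minimum counts, or None if no window qualifies.
    """
    for end in sorted(daily_counts, reverse=True):   # newest candidate end first
        start = end - timedelta(days=span_days - 1)  # inclusive 30-day span
        n_day = n_night = 0
        for d, (nd, nn) in daily_counts.items():
            if start <= d <= end:
                n_day += nd
                n_night += nn
        if n_day >= min_day and n_night >= min_night:
            return start, end
    return None


counts = {
    date(2021, 1, 1): (5, 5),
    date(2021, 1, 20): (5, 0),
    date(2021, 3, 1): (1, 1),
}
window = most_recent_window(counts, min_day=8, min_night=4)
```

Here the window ending 1 March fails the thresholds, so the search falls back to the window ending 20 January; dSpO2 and nSpO2 would then be recomputed from measurements dated within the returned window.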
Model M1 was fit using dSpO2 and nSpO2 values for the Rolling 30d Cohort calculated from 30 days of data per subject, as well as for the same cohort using the full study timespan. Regression coefficients for the fitted models are compared with those of the full subject cohort in Supplementary Figure 4, and tabulated in Supplementary Table 10.

Supplementary Note 5 (Influence of Chronic Health Conditions and Smoking Habits on Regression Results)
To analyze whether the systematic SpO2 trends with respect to Age and BMI might arise from accumulated chronic lung disease or cigarette use, we fit model M1 after grouping subjects based on self-reported chronic lung disease and cigarette use. From the full cohort of 33,080 subjects, 980 subjects (3.0%) were assigned to a 'Current Smokers' cohort based on self-reported current daily cigarette use (irrespective of self-reported health conditions). 1,015 subjects (3.1%) were assigned to an 'Unhealthy' cohort based on self-reported chronic bronchitis, chronic obstructive pulmonary disease, emphysema, or any form of heart failure. 21,631 subjects (65.4%) were assigned to a 'Healthy Lifetime Nonsmokers' cohort based on self-report of no historic or current cigarette use and no chronic bronchitis, chronic obstructive pulmonary disease, emphysema, or any form of heart failure.
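The cohort definitions above can be written as simple predicates. In this sketch the `SurveyResponse` record, its field names, and the condition labels are hypothetical codings of the self-report survey. Membership is returned as a set because the text does not state whether the 'Current Smokers' and 'Unhealthy' cohorts are mutually exclusive, and subjects matching no rule (for example, former smokers without these conditions) fall into no cohort.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the self-reported conditions named in the text.
CARDIOPULMONARY = frozenset({
    "chronic bronchitis",
    "chronic obstructive pulmonary disease",
    "emphysema",
    "heart failure",   # stands in for "any form of heart failure"
})


@dataclass(frozen=True)
class SurveyResponse:
    """Hypothetical per-subject self-report record."""
    current_daily_smoker: bool
    ever_smoked: bool
    conditions: frozenset = field(default_factory=frozenset)


def assign_cohorts(resp):
    """Return the set of analysis cohorts this subject belongs to."""
    has_condition = bool(resp.conditions & CARDIOPULMONARY)
    cohorts = set()
    if resp.current_daily_smoker:
        cohorts.add("Current Smokers")          # irrespective of health conditions
    if has_condition:
        cohorts.add("Unhealthy")
    if not resp.ever_smoked and not resp.current_daily_smoker and not has_condition:
        cohorts.add("Healthy Lifetime Nonsmokers")
    return cohorts
```

Under this coding, the three cohort sizes need not sum to the full cohort, consistent with the 3.0%, 3.1% and 65.4% shares reported above.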
Model M1 was fit using dSpO2 and nSpO2 values for each health condition/smoking history cohort. Regression coefficients for the fitted models are compared with those of the full subject cohort in Supplementary Figure 5, and tabulated in Supplementary Table 11. Current smokers and subjects with cardiopulmonary disease exhibit significantly lower nSpO2 intercept terms and a significantly greater decline in both dSpO2 and nSpO2 with Age, compared with healthy lifetime nonsmokers. Comparison of M1 fit coefficients from the full cohort with those from healthy lifetime nonsmokers shows equivalent declines in both dSpO2 and nSpO2 with increasing Age and BMI in both groups.